NSF PAR Search | NSF Public Access Repository

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes

https://doi.org/10.1093/biostatistics/kxae017

Tiberi, Simone; Meili, Joël; Cai, Peiying; Soneson, Charlotte; He, Dongze; Sarkar, Hirak; Avalos-Pacheco, Alejandra; Patro, Rob; Robinson, Mark D (June 2024, Biostatistics)

Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
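
To make the quantity of interest concrete, the sketch below computes the relative abundance of unspliced mRNA (unspliced over total) per sample and averages it within each condition. This is only a naive point estimate for illustration: it ignores quantification uncertainty and multi-mapping reads, and it is not the Bayesian hierarchical model of DifferentialRegulation. All function names and the input layout are hypothetical.

```python
from collections import defaultdict


def unspliced_fraction(spliced, unspliced):
    """Return unspliced / (spliced + unspliced), or None if there are no reads."""
    total = spliced + unspliced
    if total == 0:
        return None
    return unspliced / total


def mean_unspliced_fraction_by_group(samples):
    """Average the per-sample unspliced fraction within each group.

    `samples` is an iterable of (group, spliced_count, unspliced_count)
    tuples for a single gene (a hypothetical layout chosen for this sketch).
    Samples with zero total reads are skipped.
    """
    fractions = defaultdict(list)
    for group, spliced, unspliced in samples:
        frac = unspliced_fraction(spliced, unspliced)
        if frac is not None:
            fractions[group].append(frac)
    return {group: sum(vals) / len(vals) for group, vals in fractions.items()}


if __name__ == "__main__":
    # Made-up counts for one gene in two conditions.
    example = [
        ("healthy", 75, 25),
        ("healthy", 50, 50),
        ("diseased", 25, 75),
        ("diseased", 50, 50),
    ]
    print(mean_unspliced_fraction_by_group(example))
```

A difference between the group means only hints at differential regulation; deciding whether it is statistically supported is what the model described in the abstract is for.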
Full Text Available
Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data

https://doi.org/10.1038/s41592-022-01408-3

He, Dongze; Zakeri, Mohsen; Sarkar, Hirak; Soneson, Charlotte; Srivastava, Avi; Patro, Rob (March 2022, Nature Methods)

Full Text Available
Preprocessing choices affect RNA velocity results for droplet scRNA-seq data

https://doi.org/10.1371/journal.pcbi.1008585

Soneson, Charlotte; Srivastava, Avi; Patro, Rob; Stadler, Michael B. (January 2021, PLOS Computational Biology)
Li, Min (Ed.)
Experimental single-cell approaches are becoming widely used for many purposes, including investigation of the dynamic behaviour of developing biological systems. Consequently, a large number of computational methods for extracting dynamic information from such data have been developed. One example is RNA velocity analysis, in which spliced and unspliced RNA abundances are jointly modeled in order to infer a ‘direction of change’ and thereby a future state for each cell in the gene expression space. Naturally, the accuracy and interpretability of the inferred RNA velocities depend crucially on the correctness of the estimated abundances. Here, we systematically compare five widely used quantification tools, in total yielding thirteen different quantification approaches, in terms of their estimates of spliced and unspliced RNA abundances in five experimental droplet scRNA-seq data sets. We show that there are substantial differences between the quantifications obtained from different tools, and identify typical genes for which such discrepancies are observed. We further show that these abundance differences propagate to the downstream analysis, and can have a large effect on estimated velocities as well as the biological interpretation. Our results highlight that abundance quantification is a crucial aspect of the RNA velocity analysis workflow, and that both the definition of the genomic features of interest and the quantification algorithm itself require careful consideration.
Full Text Available
Alignment and mapping methodology influence transcript abundance estimation

https://doi.org/10.1186/s13059-020-02151-8

Srivastava, Avi; Malik, Laraib; Sarkar, Hirak; Zakeri, Mohsen; Almodaresi, Fatemeh; Soneson, Charlotte; Love, Michael I.; Kingsford, Carl; Patro, Rob (December 2020, Genome Biology)
Background: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results: We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion: We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
Full Text Available
Tximeta: Reference sequence checksums for provenance identification in RNA-seq

https://doi.org/10.1371/journal.pcbi.1007664

Love, Michael I.; Soneson, Charlotte; Hickey, Peter F.; Johnson, Lisa K.; Pierce, N. Tessa; Shepherd, Lori; Morgan, Martin; Patro, Rob; Pertea, Mihaela (February 2020, PLOS Computational Biology)

Full Text Available
A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

https://doi.org/10.26508/lsa.201800175

Soneson, Charlotte; Love, Michael I; Patro, Rob; Hussain, Shobbir; Malhotra, Dheeraj; Robinson, Mark D (January 2019, Life Science Alliance)

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
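
The comparison described above can be sketched as follows: predicted junction coverage is accumulated from the transcripts that contain each junction, rescaled to the observed total, and compared with the observed junction-spanning read counts. This is a simplified illustration under stated assumptions, not necessarily the published definition of the junction coverage compatibility score; the function names, the per-transcript coverage input, and the normalization are all hypothetical choices made for this sketch.

```python
from collections import defaultdict


def predicted_junction_coverage(transcript_junctions, transcript_coverage):
    """Sum, for each junction, the coverage of every transcript containing it.

    `transcript_junctions` maps a transcript ID to the junction IDs it spans;
    `transcript_coverage` maps a transcript ID to a coverage value derived
    from its estimated abundance (both layouts are assumptions).
    """
    predicted = defaultdict(float)
    for transcript, junctions in transcript_junctions.items():
        for junction in junctions:
            predicted[junction] += transcript_coverage.get(transcript, 0.0)
    return dict(predicted)


def junction_disagreement(observed, predicted):
    """Return a normalized disagreement between observed and predicted coverage.

    The predicted values are rescaled so that their sum matches the observed
    sum, then the absolute differences are summed and divided by the observed
    total. 0.0 means perfect agreement; larger values mean worse agreement.
    Returns None when either total is zero.
    """
    junctions = set(observed) | set(predicted)
    observed_total = sum(observed.get(j, 0.0) for j in junctions)
    predicted_total = sum(predicted.get(j, 0.0) for j in junctions)
    if observed_total == 0 or predicted_total == 0:
        return None
    scale = observed_total / predicted_total
    difference = sum(
        abs(observed.get(j, 0.0) - scale * predicted.get(j, 0.0))
        for j in junctions
    )
    return difference / observed_total


if __name__ == "__main__":
    # Made-up gene with two transcripts sharing junction "j1".
    tx_junctions = {"txA": ["j1", "j2"], "txB": ["j1", "j3"]}
    tx_coverage = {"txA": 10.0, "txB": 30.0}
    predicted = predicted_junction_coverage(tx_junctions, tx_coverage)
    observed = {"j1": 40.0, "j2": 40.0, "j3": 0.0}
    print(junction_disagreement(observed, predicted))
```

A gene whose observed junction reads are proportional to the predictions receives a value of 0.0, while a gene such as the made-up example, where one junction is heavily used despite a low estimated abundance for its transcript, receives a clearly positive value.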
Full Text Available
RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

https://doi.org/10.1146/annurev-biodatasci-072018-021255

Van den Berge, Koen; Hembach, Katharina M.; Soneson, Charlotte; Tiberi, Simone; Clement, Lieven; Love, Michael I.; Patro, Rob; Robinson, Mark D. (July 2019, Annual Review of Biomedical Data Science)

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Full Text Available
Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification

https://doi.org/10.12688/f1000research.15398.3

Love, Michael I.; Soneson, Charlotte; Patro, Rob (January 2018, F1000Research)

Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
Full Text Available
